Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models
Significant advances have been made in the in silico prediction of molecular and pharmacokinetic properties associated with successful drug-like molecules (Leeson et al., 2021; Lombardo et al., 2017). These cheminformatics advances have laid the foundation for further enhancements in drug candidate screening, prioritization for advancement into in vivo studies, and clinical candidate selection (Maurer et al., 2021). Despite these impressive improvements in molecular property predictions, a considerable challenge remains in accurately predicting the affinity/potency of a ligand-protein interaction (LPI), also known as a drug-target interaction (DTI) (Yamanishi et al., 2008). Drugs convey their phenotypic effects through interactions with a variety of biological targets with varying affinities (Swinney & Anthony, 2011). Some interactions produce desirable outcomes and phenotypes, while others can create undesired side effects and/or safety risks (Waring et al., 2015). Accurately predicting the affinities of ligand-protein interactions would enable drug discovery teams to better design and prioritize the synthesis of molecules that interact with intended protein targets, while minimizing undesired interactions with off-targets like hERG and liver enzymes, ultimately increasing the chances of preclinical success.
PILOT: Equivariant diffusion for pocket conditioned de novo ligand generation with multi-objective guidance via importance sampling
Cremer, Julian, Le, Tuan, Noé, Frank, Clevert, Djork-Arné, Schütt, Kristof T.
The generation of ligands that both are tailored to a given protein pocket and exhibit a range of desired chemical properties is a major challenge in structure-based drug design. Here, we propose an in-silico approach for the $\textit{de novo}$ generation of 3D ligand structures using the equivariant diffusion model PILOT, combining pocket conditioning with large-scale pre-training and property guidance. Its multi-objective trajectory-based importance sampling strategy is designed to direct the model towards molecules that not only exhibit desired characteristics, such as increased binding affinity for a given protein pocket, but also maintain high synthetic accessibility. This ensures the practicality of sampled molecules, thus maximizing their potential for the drug discovery pipeline. PILOT significantly outperforms existing methods across various metrics on the common benchmark dataset CrossDocked2020. Moreover, we employ PILOT to generate novel ligands for unseen protein pockets from the Kinodata-3D dataset, which encompasses a substantial portion of the human kinome. The generated structures exhibit predicted $IC_{50}$ values indicative of potent biological activity, which highlights the potential of PILOT as a powerful tool for structure-based drug design.
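The importance-sampling idea in the abstract can be illustrated with a minimal sketch. It is not PILOT's actual implementation: the function name `importance_resample`, the linear combination of objectives, the softmax weighting, and the `temperature` parameter are all assumptions chosen for illustration. The sketch weights a batch of candidates by their combined property scores and resamples them, so that higher-scoring candidates are more likely to be carried forward.

```python
import numpy as np

def importance_resample(candidates, objective_scores, objective_weights,
                        temperature=1.0, rng=None):
    """Resample candidates in proportion to softmax-weighted objective scores.

    candidates: list of arbitrary objects (e.g. intermediate ligand samples).
    objective_scores: array of shape (n_candidates, n_objectives); higher is
        better for every objective (an assumption of this sketch).
    objective_weights: array of shape (n_objectives,), hypothetical trade-off
        weights between objectives.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Collapse the objectives into one scalar score per candidate.
    combined = np.asarray(objective_scores, dtype=float) @ np.asarray(
        objective_weights, dtype=float)
    # Numerically stable softmax gives the importance weights.
    logits = combined / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()
    # Draw a new batch of the same size, with replacement.
    idx = rng.choice(len(candidates), size=len(candidates), replace=True,
                     p=weights)
    return [candidates[i] for i in idx], weights
```

A lower `temperature` concentrates the resampled batch on the best-scoring candidates, while a higher one keeps more diversity.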
Materials Discovery with Extreme Properties via AI-Driven Combinatorial Chemistry
Kim, Hyunseung, Choi, Haeyeon, Kang, Dongju, Lee, Won Bo, Na, Jonggeol
The goal of most materials discovery efforts is to find materials that are superior to those currently known. Fundamentally, this is close to extrapolation, which is a weak point for most machine learning models that learn the probability distribution of data. Herein, we develop AI-driven combinatorial chemistry, which is a rule-based inverse molecular designer that does not rely on data. Since our model has the potential to generate all possible molecular structures that can be obtained from combinations of molecular fragments, unknown materials with superior properties can be discovered. We theoretically and empirically demonstrate that our model is more suitable for discovering better materials than probability distribution-learning models. In an experiment aimed at discovering molecules that hit seven target properties, our model discovered 1,315 molecules that hit all seven targets and 7,629 molecules that hit five targets out of 100,000 trials, whereas the probability distribution-learning models failed. To illustrate the performance in actual problems, we also demonstrate that our models work well on two practical applications: discovering protein docking materials and HIV inhibitors.
Decoding the Protein-ligand Interactions Using Parallel Graph Neural Networks
Knutson, Carter, Bontha, Mridula, Bilbrey, Jenna A., Kumar, Neeraj
Protein-ligand interactions (PLIs) are fundamental to biochemical research, and their identification is crucial for estimating biophysical and biochemical properties for rational therapeutic design. Currently, experimental characterization of these properties is the most accurate method; however, it is very time-consuming and labor-intensive. A number of computational methods have been developed in this context, but most existing PLI prediction methods depend heavily on 2D protein sequence data. Here, we present a novel parallel graph neural network (GNN) to integrate knowledge representation and reasoning for PLI prediction to perform deep learning guided by expert knowledge and informed by 3D structural data. We develop two distinct GNN architectures: GNNF is the base implementation that employs distinct featurization to enhance domain-awareness, while GNNP is a novel implementation that can predict with no prior knowledge of the intermolecular interactions. The comprehensive evaluation demonstrated that the GNNs can successfully capture the binary interactions between the ligand and the protein's 3D structure, with a test accuracy of 0.979 for GNNF and 0.958 for GNNP in predicting the activity of a protein-ligand complex. These models are further adapted for regression tasks to predict experimental binding affinities and pIC50, which are crucial for drug potency and efficacy. We achieve Pearson correlation coefficients of 0.66 and 0.65 on experimental affinity and 0.50 and 0.51 on pIC50 with GNNF and GNNP, respectively, outperforming similar 2D sequence-based models. Our method can serve as an interpretable and explainable artificial intelligence (AI) tool for predicting the activity, potency, and biophysical properties of lead candidates. To this end, we show the utility of GNNP on SARS-CoV-2 protein targets by screening a large compound library and comparing our predictions with the experimentally measured data.
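For readers unfamiliar with the regression metrics quoted above, the following sketch shows how pIC50 is conventionally derived from an IC50 measurement (the negative base-10 logarithm of the IC50 in molar units) and how a Pearson correlation coefficient between measured and predicted values can be computed. The helper names `pic50_from_nM` and `pearson_r` are hypothetical, and the assumption that IC50 values arrive in nanomolar is ours, not the authors'.

```python
import numpy as np

def pic50_from_nM(ic50_nM):
    """Convert IC50 values given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    ic50_molar = np.asarray(ic50_nM, dtype=float) * 1e-9
    return -np.log10(ic50_molar)

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; the off-diagonal
    # entry is the correlation between the two inputs.
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```

Under this convention an IC50 of 1000 nM (1 µM) corresponds to a pIC50 of 6, and a tenfold increase in potency raises pIC50 by one unit.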
Benchmarking Deep Graph Generative Models for Optimizing New Drug Molecules for COVID-19
Ward, Logan, Bilbrey, Jenna A., Choudhury, Sutanay, Kumar, Neeraj, Sivaraman, Ganesh
Design of new drug compounds with target properties is a key area of research in generative modeling. We present a small drug molecule design pipeline based on graph-generative models and a comparison study of two state-of-the-art graph generative models for designing COVID-19 targeted drug candidates: 1) a variational autoencoder-based approach (VAE) that uses prior knowledge of molecules that have been shown to be effective for earlier coronavirus treatments and 2) a deep Q-learning method (DQN) that generates optimized molecules without any proximity constraints. We evaluate the novelty of the automated molecule generation approaches by validating the candidate molecules with drug-protein binding affinity models. The VAE method produced two novel molecules with similar structures to the antiretroviral protease inhibitor Indinavir that show potential binding affinity for the SARS-CoV-2 protein target 3-chymotrypsin-like protease (3CL-protease).
Deep Confidence: A Computationally Efficient Framework for Calculating Reliable Errors for Deep Neural Networks
Cortes-Ciriano, Isidro, Bender, Andreas
Deep learning architectures have proved versatile in a number of drug discovery applications, including the modelling of in vitro compound activity. While controlling for prediction confidence is essential to increase the trust, interpretability and usefulness of virtual screening models in drug discovery, techniques to estimate the reliability of the predictions generated with deep learning networks remain largely underexplored. Here, we present Deep Confidence, a framework to compute valid and efficient confidence intervals for individual predictions using the deep learning technique Snapshot Ensembling and conformal prediction. Specifically, Deep Confidence generates an ensemble of deep neural networks by recording the network parameters throughout the local minima visited during the optimization phase of a single neural network. This approach serves to derive a set of base learners (i.e., snapshots) with comparable predictive power on average, that will however generate slightly different predictions for a given instance. The variability across base learners and the validation residuals are in turn harnessed to compute confidence intervals using the conformal prediction framework. Using a set of 24 diverse IC50 data sets from ChEMBL 23, we show that Snapshot Ensembles perform on par with Random Forest (RF) and ensembles of independently trained deep neural networks. In addition, we find that the confidence regions predicted using the Deep Confidence framework span a narrower set of values. Overall, Deep Confidence represents a highly versatile error prediction framework that can be applied to any deep learning-based application at no extra computational cost.
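The combination of ensemble variability and validation residuals described above can be sketched as a normalized split-conformal procedure. This is an illustration under stated assumptions, not the authors' code: the function name `conformal_intervals`, the choice of the ensemble standard deviation as the scaling term, and the small `eps` stabilizer are ours. Snapshot predictions are assumed to be already available as arrays of shape `(n_snapshots, n_samples)`.

```python
import numpy as np

def conformal_intervals(val_preds, val_true, test_preds,
                        confidence=0.8, eps=1e-8):
    """Prediction intervals from an ensemble via normalized conformal scores.

    val_preds, test_preds: arrays of shape (n_snapshots, n_samples) holding
        each snapshot's predictions on the validation and test sets.
    val_true: array of shape (n_val_samples,) with observed validation values.
    """
    val_preds = np.asarray(val_preds, dtype=float)
    test_preds = np.asarray(test_preds, dtype=float)
    val_true = np.asarray(val_true, dtype=float)

    # Ensemble mean is the point prediction; the spread across snapshots
    # acts as a per-instance difficulty estimate.
    val_mean = val_preds.mean(axis=0)
    val_std = val_preds.std(axis=0) + eps

    # Nonconformity score: absolute validation residual scaled by the spread.
    scores = np.sort(np.abs(val_true - val_mean) / val_std)

    # Finite-sample-corrected rank of the score quantile. If the rank exceeds
    # the number of validation points the exact interval would be unbounded;
    # this sketch clamps to the largest score instead.
    n = scores.size
    k = min(int(np.ceil((n + 1) * confidence)), n)
    q = scores[k - 1]

    test_mean = test_preds.mean(axis=0)
    half_width = q * (test_preds.std(axis=0) + eps)
    return test_mean - half_width, test_mean + half_width
```

Because the half-width scales with the snapshot spread, instances on which the snapshots disagree receive wider intervals, while the quantile `q` calibrates the overall width against the validation residuals.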
Highly Scalable Tensor Factorization for Prediction of Drug-Protein Interaction Type
Arany, Adam, Simm, Jaak, Zakeri, Pooya, Haber, Tom, Wegner, Jörg K., Chupakhin, Vladimir, Ceulemans, Hugo, Moreau, Yves
Understanding the type of inhibitory interaction plays an important role in drug design. Researchers are therefore interested in knowing whether a drug interacts competitively or non-competitively with particular protein targets. Method: to analyze the interaction types, we propose the factorization method Macau, which allows us to combine different measurement types into a single tensor together with proteins and compounds. The compounds are characterized by high-dimensional 2D ECFP fingerprints. The novelty of the proposed method is that, using a specially designed noise-injection MCMC sampler, it can incorporate high-dimensional side information, i.e., millions of unique 2D ECFP compound features, even for large-scale datasets of millions of compounds. Without the side information, the tensor factorization would in this case be practically futile. Results: using public IC50 and Ki data from ChEMBL, we trained a model from which we can identify the latent subspace separating the two measurement types (IC50 and Ki). The results suggest that the proposed method can detect competitive inhibitory activity between compounds and proteins.